Search Result

Recommending clone refactoring method based on decision tree

SHE Rongrong, ZHANG Liping, HOU Min, YAN Sheng

Journal of Computer Applications 2018, 38 (7): 2037-2043. DOI: 10.11772/j.issn.1001-9081.2017122997

To address the long-term maintenance burden, and even the introduction of errors, caused by extensive use of cloned code, a decision-tree-based classifier was proposed to recommend clones for refactoring. Firstly, clones were detected using NiCad. Secondly, features related to the clone relationship, the cloned code fragments, and the clone context were collected. Thirdly, a decision tree classifier was trained on these features. Finally, the classification results were evaluated by K-fold cross-validation. The experiments were conducted on nearly 600 clones from five open-source software systems. The experimental results show that the proposed method achieves 80% accuracy when recommending clone refactoring instances for each target system.
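The K-fold evaluation step mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' setup: the function names, the default `k=10`, and the placeholder majority-class learner (standing in for the decision tree) are all assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Return the mean held-out accuracy of train_fn over k folds.

    train_fn(train_x, train_y) must return a predict(x) callable.
    Assumes k <= len(labels), so that no fold is empty.
    """
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def train_majority(train_x, train_y):
    """Hypothetical stand-in learner: always predicts the most common label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority
```

In practice `train_majority` would be replaced by a function that fits a decision tree on the collected clone features and returns its prediction function.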

Improved pedestrian detection method based on convolutional neural network

XU Chao, YAN Shengye

Journal of Computer Applications 2017, 37 (6): 1708-1715. DOI: 10.11772/j.issn.1001-9081.2017.06.1708

To choose a better model and obtain more accurate bounding boxes when using a Convolutional Neural Network (CNN) for pedestrian detection, an improved CNN-based pedestrian detection method was proposed. The improvements address two questions: how to determine the number of training iterations for the CNN, and how to merge multiple responses to the same object. For the first, multiple candidate CNN classifiers were learned from different training samples at different training iterations, and a new strategy was proposed to select the model with better generalization ability. The strategy considers both the accuracy on the validation set and the stability of the accuracies during iterative training. For the second, an enhanced refined bounding-box combination method was proposed, which differs from the Non-Maximum Suppression (NMS) method. The coarse bounding boxes output by the CNN detection procedure were taken as input, and a CNN-based precise localization process was applied to each coarse bounding box to obtain its one-to-one refined bounding box. Finally, the multiple refined bounding boxes were merged by considering the correction probability of each bounding box. Specifically, the final output bounding box was obtained as the weighted average of the relevant refined bounding boxes, weighted by their correction probabilities. To evaluate the two improvements, comprehensive experiments were conducted on the widely used ETH pedestrian detection benchmark dataset. The experimental results show that both improvements effectively improve the detection performance of the system. Compared with the benchmark method, Fast Region-based CNN (Fast R-CNN), the proposed method combining the two improvements improves detection performance by 5.06 percentage points under the same test conditions.
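The final merging step, a probability-weighted average of refined bounding boxes, might look like the sketch below. The `(x1, y1, x2, y2)` box layout and the function name are assumptions for illustration; the abstract does not specify how boxes are grouped as "relevant" or how the probabilities are computed.

```python
def merge_boxes(boxes, probabilities):
    """Merge boxes given as (x1, y1, x2, y2) by probability-weighted average.

    boxes and probabilities are parallel sequences; the weights are
    normalized by their sum, so they need not add up to 1.
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return tuple(
        sum(p * box[i] for box, p in zip(boxes, probabilities)) / total
        for i in range(4)
    )
```

Unlike NMS, which keeps one box and discards its overlapping neighbors, this kind of merge lets every refined box contribute to the output in proportion to its weight.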

Feature selection model for harmfulness prediction of clone code

WANG Huan, ZHANG Liping, YAN Sheng, LIU Dongsheng

Journal of Computer Applications 2017, 37 (4): 1135-1142. DOI: 10.11772/j.issn.1001-9081.2017.04.1135

To solve the problem of irrelevant and redundant features in harmfulness prediction of clone code, a combined feature selection model for clone code harmfulness was proposed based on relevance and influence. Firstly, features were preliminarily ranked by relevance using the information gain ratio; highly relevant features were retained and irrelevant ones were removed to reduce the feature search space. Next, the optimal feature subset was determined by the wrapper-style sequential floating forward selection algorithm combined with six classifiers, including Naive Bayes. Finally, the different feature selection methods were analyzed, and the feature data were analyzed, filtered, and optimized by exploiting the advantages of the various methods under different selection criteria. Experimental results show that prediction accuracy increases by 15.2-34 percentage points after feature selection; compared with other feature selection methods, the F1-measure of this method increases by 1.1-10.1 percentage points and the AUC increases by 0.7-22.1 percentage points. As a result, this method can greatly improve the accuracy of the harmfulness prediction model.
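The information gain ratio used for the preliminary ranking can be computed as in the following sketch for a discrete feature. The function names are illustrative, and continuous features would need to be discretized first, which is not shown here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split info."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split info and carries no information.
    return info_gain / split_info if split_info > 0 else 0.0
```

Features would then be sorted by `gain_ratio` in descending order, and only the top-ranked ones passed on to the wrapper search.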

Solution for classification imbalance in harmfulness prediction of clone code

WANG Huan, ZHANG Liping, YAN Sheng

Journal of Computer Applications 2016, 36 (12): 3468-3475. DOI: 10.11772/j.issn.1001-9081.2016.12.3468

To address the class imbalance between harmful and harmless data in harmfulness prediction of clone code, a K-Balance algorithm based on Random Under-Sampling (RUS) was proposed, which adjusts the class imbalance automatically. Firstly, a sample data set was constructed by extracting static features and evolution features of clone code. Then, new data sets with different class proportions were sampled from it. Next, harmfulness prediction was carried out on each newly sampled data set. Finally, the most suitable class proportion was chosen automatically by comparing the classifier's performance across the data sets. The performance of the clone code harmfulness prediction model was evaluated on seven different types of open-source software systems written in C, comprising 170 versions. The experimental results show that, compared with other class imbalance solutions, the proposed method improves the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) for classifying harmful and harmless clones by 2.62 to 36.7 percentage points. The proposed method can effectively improve prediction under class imbalance.
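The abstract does not give the details of K-Balance, so the sketch below only illustrates the general idea it describes: under-sample the majority class at several candidate proportions and keep the proportion that scores best. All names, the `ratio` parameter (majority samples kept per minority sample), and the caller-supplied `score_fn` are assumptions.

```python
import random


def random_under_sample(samples, labels, majority_label, ratio, seed=0):
    """Keep every minority sample and about ratio * (minority count)
    randomly chosen majority samples, preserving the original order."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep = min(len(majority), int(round(ratio * len(minority))))
    chosen = sorted(minority + rng.sample(majority, keep))
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


def choose_ratio(candidate_ratios, score_fn):
    """Return the candidate ratio with the highest score.

    score_fn(ratio) is a hypothetical callback that resamples the data at
    that ratio, trains a classifier, and returns a metric such as AUC.
    """
    return max(candidate_ratios, key=score_fn)
```

For example, `choose_ratio([1.0, 2.0, 3.0], score_fn)` would try three proportions and return the one whose resampled training set yields the best validation score.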

Harmfulness prediction of clone code based on Bayesian network

ZHANG Liping, ZHANG Ruixia, WANG Huan, YAN Sheng

Journal of Computer Applications 2016, 36 (1): 260-265. DOI: 10.11772/j.issn.1001-9081.2016.01.0260

During software development, programmer activities such as copying and pasting produce many code clones, and inconsistent changes to these clones are often harmful to the program. To find harmful code clones effectively, a method was proposed to predict them using a Bayesian network. First, drawing on related research on software defect prediction and clone evolution, two kinds of software metrics, static metrics and evolution metrics, were proposed to characterize the features of clone code. Then the prediction model was constructed using the core algorithm of the Bayesian network. Finally, the probability of harmful code clones occurring was predicted. The prediction model was evaluated on five different types of open-source software systems written in C, comprising 99 versions. The experimental results show that the proposed method can predict clone harmfulness with better applicability and higher accuracy, and can further reduce the threat of harmful code clones while improving software quality.

Matrix-structural fast learning of cascaded classifier for negative sample inheritance

LIU Yang, YAN Shengye, LIU Qingshan

Journal of Computer Applications 2015, 35 (9): 2596-2601. DOI: 10.11772/j.issn.1001-9081.2015.09.2596

The negative sample bootstrap process in the matrix-structural learning of cascaded classifier algorithm is inefficient at obtaining high-quality samples, and this harms both the overall learning efficiency and the performance of the final classifier. To address these disadvantages, a fast learning algorithm was proposed: matrix-structural fast learning of cascaded classifier with negative sample inheritance. Its negative sample bootstrap process combines sample inheritance with graded bootstrap: it first inherits helpful samples from the negative sample set used in the previous training stage, and then obtains the remaining samples needed from the negative image set. Sample inheritance narrows the range over which useful samples must be bootstrapped, which accelerates the bootstrap. Sample pre-screening during the bootstrap process increases sample complexity and improves the performance of the final classifier. The experimental results show that, compared with the matrix-structural learning of cascaded classifier algorithm, the proposed algorithm saves 20 h of training time and improves detection performance by 1 percentage point. Compared with 17 other human detection algorithms, the proposed algorithm also achieves good performance.
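The inherit-then-bootstrap order described above can be sketched as follows. The abstract does not define which samples count as "helpful"; this sketch assumes they are the previous-stage negatives that the current cascade still accepts (its false positives). The function name, the `passes_cascade` callback, and `target_size` are hypothetical, and the pre-screening step is not modeled.

```python
def collect_negatives(previous_negatives, candidate_pool, passes_cascade, target_size):
    """Build the next stage's negative set: inherit first, then bootstrap.

    passes_cascade(sample) returns True when the current cascade wrongly
    accepts the sample, which makes it a useful hard negative (assumption).
    """
    # Step 1: inherit still-useful samples from the previous stage's set.
    negatives = [s for s in previous_negatives if passes_cascade(s)][:target_size]
    # Step 2: fill the shortfall by scanning the negative image pool.
    for sample in candidate_pool:
        if len(negatives) >= target_size:
            break
        if passes_cascade(sample):
            negatives.append(sample)
    return negatives
```

Because inherited samples are checked before the larger pool is scanned, fewer pool candidates need to be evaluated to reach `target_size`.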
